# Multimodal Medical Analysis
Medgemma 4b It
Other
MedGemma is a medical multimodal AI model developed by Google on the Gemma 3 architecture, focused on medical text and image understanding.
Image-to-Text
Transformers

unsloth
223
2
Google.medgemma 4b It GGUF
MedGemma-4B-IT is a medical-focused image-to-text generation model developed by Google.
Image-to-Text
DevQuasar
6,609
1
Medgemma 4b It
Other
MedGemma is a series of medical multimodal models built on Gemma 3 and designed for medical text and image understanding, available in 4B and 27B parameter versions.
Image-to-Text
Transformers

google
15.36k
259
Dermatech Qwen2 VL 2B I1 GGUF
A multimodal model based on the Qwen2 architecture, focused on text generation, image-to-text, and visual question answering tasks.
Image-to-Text English
mradermacher
60
0
Llama 3.2 11B Vision Radiology Mini
Apache-2.0
A radiology image interpretation assistant fine-tuned from unsloth/Llama-3.2-11B-Vision-Instruct, with optimizations reported to double runtime speed.
Image-to-Text
Transformers English

0llheaven
885
1
PULSE 7B
Apache-2.0
A multimodal large language model (MLLM) specifically designed for interpreting electrocardiogram (ECG) images, capable of handling various ECG-related tasks from diverse data sources.
Image-to-Text English
PULSE-ECG
21.81k
18
Llava Med V1.5 Mistral 7b
Apache-2.0
LLaVA-Med is a large language-vision biomedical assistant trained through curriculum learning, specifically designed for biomedical visual question answering tasks.
Image-to-Text
Transformers

microsoft
75.68k
85
Chinese LLaVA Med 7B
Apache-2.0
A Chinese medical multimodal large language model based on the LLaVA-1.5 architecture, focusing on visual question answering tasks in the medical field.
Image-to-Text
Transformers Chinese

BUAADreamer
60
4
Chexpert Mimic Cxr Impression Baseline
MIT
A text generation model that produces radiology impression reports from chest X-ray images.
Image-to-Text
Transformers English

IAMJB
52.87k
0
Llava Roco 8bit
BabyDoctor is a multimodal large language model that combines the capabilities of CLIP and LLaMA 2. It can understand and generate text while also comprehending images. The model has been fine-tuned specifically for interpreting radiology images such as X-rays, ultrasounds, MRIs, and CT scans.
Image-to-Text
Transformers English

photonmz
29
15
Rclip
GPL-3.0
RCLIP is a vision-language model fine-tuned from CLIP and optimized for medical image analysis in the radiology domain.
Text-to-Image
Transformers English

kaveh
42
2
Quiltnet B 16 PMB
MIT
A multimodal foundation model built on a ViT-B/16 visual encoder and a PubMedBERT text encoder, trained on the Quilt-1M pathology video dataset.
Image-to-Text
wisdomik
513
5
Quiltnet B 32
MIT
A CLIP ViT-B/32 vision-language foundation model trained on the Quilt-1M pathology video dataset and designed for histological analysis.
Text-to-Image
wisdomik
8,442
22